{
 "metadata": {
  "name": "",
  "signature": "sha256:6116fc753692e334327b47bf860e67da1a56d3d94315d831dccd09e1979b07bf"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<img src=http://continuum.io/media/img/continuum_analytics_logo.png align=\"right\" width=\"30%\">\n",
      "\n",
      "# Blaze Expressions\n",
      "\n",
      "Blaze expressions convey intent from the user.\n",
      "\n",
      "Blaze compute functions interpret from expressions to backend.\n",
      "\n",
      "The interface in this notebook is not intended for interactive use.  You may find *interactive expressions* (like `Data`) more useful for data exploration."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blaze import symbol, compute, join"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "bank = symbol('bank', '''1000 * {id: int, \n",
      "                                 name: string, \n",
      "                                 balance: int,\n",
      "                                 lastseen: datetime}''')\n",
      "bank  # no data to see here"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "/home/mrocklin/Software/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:239: FormatterWarning: Exception in text/html formatter: Expression does not contain data resources\n",
        "  FormatterWarning,\n"
       ]
      },
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 2,
       "text": [
        "bank"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "deadbeats = bank[bank.balance < 0][['name', 'lastseen']]\n",
      "deadbeats"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "bank[bank.balance < 0][['name', 'lastseen']]"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "deadbeats.dshape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "dshape(\"var * {name: string, lastseen: datetime}\")"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Compute recipes\n",
      "\n",
      "Blaze interprets expressions against backends by consulting a repository of small recipes.\n",
      "\n",
      "We look at some simple recipes for Python, Pandas, and Spark"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "L = [[1, 'Alice',   100],\n",
      "     [2, 'Bob',    -200],\n",
      "     [3, 'Charlie', 300],\n",
      "     [4, 'Dennis',  400],\n",
      "     [5, 'Edith',  -500]]\n",
      "\n",
      "from pandas import DataFrame\n",
      "\n",
      "df = DataFrame([[1, 'Alice',   100],                         \n",
      "                [2, 'Bob',    -200],\n",
      "                [3, 'Charlie', 300],\n",
      "                [4, 'Denis',   400],\n",
      "                [5, 'Edith',  -500]], columns=['id', 'name', 'balance'])\n",
      "\n",
      "import pyspark\n",
      "\n",
      "sc = pyspark.SparkContext('local', 'blaze-app')\n",
      "rdd = sc.parallelize(L)\n",
      "\n",
      "bank = symbol('bank', '''1000 * {id: int, \n",
      "                                 name: string, \n",
      "                                 balance: int}''')\n",
      "\n",
      "deadbeats = bank[bank.balance < 0].name"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "compute(deadbeats, L)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 6,
       "text": [
        "<itertools.chain at 0x7f1c47d14dd0>"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "compute(deadbeats, df)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "1      Bob\n",
        "4    Edith\n",
        "Name: name, dtype: object"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "compute(deadbeats, rdd)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 8,
       "text": [
        "PythonRDD[1] at RDD at PythonRDD.scala:43"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## How Blaze handles numeric evaluation\n",
      "\n",
      "*or, how to stay sane while trying to engage the entire Python ecosystem*"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blaze.compute.core import compute_up"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "compute_up.source(bank.head(), df)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "File: /home/mrocklin/workspace/blaze/blaze/compute/pandas.py\n",
        "\n",
        "@dispatch(Head, (Series, DataFrame))\n",
        "def compute_up(t, df, **kwargs):\n",
        "    return df.head(t.n)\n",
        "\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "compute_up.source(bank.head(), L)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "File: /home/mrocklin/workspace/blaze/blaze/compute/python.py\n",
        "\n",
        "@dispatch(Head, Sequence)\n",
        "def compute_up(t, seq, **kwargs):\n",
        "    if t.n < 100:\n",
        "        return tuple(take(t.n, seq))\n",
        "    else:\n",
        "        return take(t.n, seq)\n",
        "\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "compute_up.source(bank.head(), rdd)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "File: /home/mrocklin/workspace/blaze/blaze/compute/spark.py\n",
        "\n",
        "@dispatch(Head, RDD)\n",
        "def compute_up(t, rdd, **kwargs):\n",
        "    return rdd.take(t.n)\n",
        "\n"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### N-Dimensional example"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = symbol('x', '1000 * 1000 * {measurement: float32, timestamp: datetime}')\n",
      "x"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "x"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 13,
       "text": [
        "x"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "expr = x[:500].measurement.sum(axis=1)\n",
      "expr"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 14,
       "text": [
        "sum(x[:500].measurement, axis=(1,))"
       ]
      }
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "expr.dshape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "dshape(\"500 * float32\")"
       ]
      }
     ],
     "prompt_number": 15
    }
   ],
   "metadata": {}
  }
 ]
}